6 research outputs found
FLAIR: Federated Learning Annotated Image Repository
Cross-device federated learning is an emerging machine learning (ML) paradigm
where a large population of devices collectively train an ML model while the
data remains on the devices. This research field has a unique set of practical
challenges, and to systematically make advances, new datasets curated to be
compatible with this paradigm are needed. Existing federated learning
benchmarks in the image domain do not accurately capture the scale and
heterogeneity of many real-world use cases. We introduce FLAIR, a challenging
large-scale annotated image dataset for multi-label classification suitable for
federated learning. FLAIR has 429,078 images from 51,414 Flickr users and
captures many of the intricacies typically encountered in federated learning,
such as heterogeneous user data and a long-tailed label distribution. We
implement multiple baselines in different learning setups for different tasks
on this dataset. We believe FLAIR can serve as a challenging benchmark for
advancing the state of the art in federated learning. Dataset access and the
code for the benchmark are available at
\url{https://github.com/apple/ml-flair}
Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices
Federated Learning (FL) is a technique to train models using data distributed
across devices. Differential Privacy (DP) provides a formal privacy guarantee
for sensitive data. Our goal is to train a large neural network language model
(NNLM) on compute-constrained devices while preserving privacy using FL and DP.
However, the DP noise added to the model grows with model size, which
often prevents convergence. We propose Partial Embedding Updates (PEU), a
novel technique that decreases noise by decreasing payload size.
Furthermore, we adopt Low Rank Adaptation (LoRA) and Noise Contrastive
Estimation (NCE) to reduce the memory demands of large models on
compute-constrained devices. This combination of techniques makes it possible
to train large-vocabulary language models while preserving accuracy and
privacy.
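The scaling problem the abstract describes can be made concrete with a small sketch (our illustration, not the paper's code): under L2 clipping, a d-dimensional update carries roughly clip_norm / sqrt(d) signal per coordinate, while the Gaussian-mechanism noise standard deviation per coordinate is fixed by the privacy budget. The per-coordinate signal-to-noise ratio therefore shrinks as the payload grows, which is why reducing payload size, as PEU does, helps convergence. The function name and parameters below are hypothetical.

```python
import numpy as np

def per_coordinate_snr(d, clip_norm=1.0, noise_multiplier=1.0):
    """Rough per-coordinate SNR of a DP-noised model update.

    With L2 clipping to clip_norm, a d-dimensional update has about
    clip_norm / sqrt(d) mass per coordinate, while the Gaussian noise
    std per coordinate is noise_multiplier * clip_norm, independent of d.
    """
    signal = clip_norm / np.sqrt(d)
    noise = noise_multiplier * clip_norm
    return signal / noise

# A smaller payload (e.g. updating only part of the embedding table)
# yields a higher per-coordinate SNR at the same privacy level.
snr_full = per_coordinate_snr(10_000_000)  # full-vocabulary payload
snr_peu = per_coordinate_snr(1_000_000)    # reduced payload
```

This is only a back-of-the-envelope model; the paper's actual gains also depend on which embedding rows are updated and on the LoRA/NCE memory reductions.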
Investigating the function of GroES with hard-to-fold proteins in vivo
The use of molecular chaperones can increase the yield of correctly folded proteins, which is especially needed when expressing proteins non-native to the host organism. This study set out to investigate the function of the chaperone GroES, a component of the GroE system. The function of this chaperone on its own has so far been studied only in vitro. Here we lay the groundwork for further studies on GroES and its ability to act alone in vivo. GroES was expressed from a plasmid and characterized through its potential to increase the amount of correctly folded protein. Characterization was mainly done by fluorescence spectroscopy with hard-to-fold proteins linked to fluorescent probes. The results show a very clear increase in fluorescence for most of the substrate proteins tested, indicating that GroES has a significant role in the GroE system and perhaps outside of it.